for more information, see https://pre-commit.ci
@jacobbieker, I tried finding some files for this, but they were too large to use for testing. Would you mind if I showed you the output for a small batch of those files, so that I can check whether it matches your vision?
jacobbieker
left a comment
There need to be some more changes for this, but thanks for the nice first step.
As for the test files, I think a good compromise would be to have some integration tests that pull real, historical NNJA BUFR files and the corresponding NNJA-AI representation, run the processing, and confirm that the two match everywhere. These can be marked with pytest.mark.skip so they are skipped in GitHub CI, but I can then run them locally and see if they match exactly.
For this, I would also probably cut the PR down to a single source, just ADPUPA for now. That makes it simpler to review and to make sure the setup works correctly with the real files.
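A minimal sketch of what such a locally run integration test could look like. It is illustrative only: the `NNJA_TEST_DATA` environment variable, the `.bufr`/`.parquet` file suffixes, and the `pair_by_stem` helper are assumptions, not part of this PR. It also uses `pytest.mark.skipif` keyed on the `GITHUB_ACTIONS` variable rather than a plain `pytest.mark.skip`, so the same file runs locally without edits.

```python
import os
from pathlib import Path

import pytest

# Hypothetical location of locally downloaded files; not part of this PR.
DATA_DIR = Path(os.environ.get("NNJA_TEST_DATA", "tests/data/nnja"))

# GitHub Actions sets GITHUB_ACTIONS=true on its runners.
ON_GITHUB_CI = os.environ.get("GITHUB_ACTIONS") == "true"


def pair_by_stem(paths):
    """Pair each .bufr file with the .parquet file that shares its stem."""
    paths = [Path(p) for p in paths]
    parquet = {p.stem: p for p in paths if p.suffix == ".parquet"}
    return [
        (p, parquet[p.stem])
        for p in sorted(paths)
        if p.suffix == ".bufr" and p.stem in parquet
    ]


@pytest.mark.skipif(ON_GITHUB_CI, reason="needs large real NNJA files")
def test_adpupa_matches_reference():
    if not DATA_DIR.is_dir():
        pytest.skip(f"no local test data in {DATA_DIR}")
    pairs = pair_by_stem(DATA_DIR.iterdir())
    assert pairs, "expected at least one BUFR/Parquet pair"
    for bufr_path, reference_path in pairs:
        # Placeholder: run the processor on bufr_path and compare the
        # result with reference_path here.
        assert bufr_path.stem == reference_path.stem
```

Files without a matching partner are ignored by `pair_by_stem`, so a partially downloaded data directory fails loudly only when no pair at all is found.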
source_name = "ADPUPA"
def _build_mappings(self):
Is that the only data in the NNJA-AI ADPUPA? I thought there were more variables
I only wrote the basic ones, and I actually wanted to ask you about that. I was leaning towards the primary descriptors I saw when I loaded conv-adpupa-NC002001 for reference.
source_name = "CrIS"
def _build_mappings(self):
    self.field_mappings = {
This should definitely have more, I think?
Why is this being moved and renamed? Seems unnecessary
Pull Request
Description
This PR adds a new BUFR processor module to enable reading and decoding NOMADS BUFR files into the NNJA-AI-compatible Parquet format.
The processor is designed to:
Decode BUFR messages using ecCodes (via the Python bindings)
Convert decoded data to the NNJA-AI archive schema, enabling seamless integration with existing workflows
Support decoding of initial high-priority observation types:
ADPUPA (upper-air soundings)
CrIS and IASI hyperspectral soundings
Serve as a modular component, so it can later be split into a dedicated repo for broader operational use.
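To make the conversion step concrete, here is a small sketch of the mapping pattern that the reviewed diff hints at (`source_name`, `_build_mappings`, `self.field_mappings`). The class name `ADPUPASketch`, the `to_row` method, and every mapping entry are hypothetical placeholders; the actual key and column names in the PR and in the NNJA-AI schema may differ. The input is a plain dict standing in for one decoded BUFR subset, so no decoder is called here.

```python
class ADPUPASketch:
    """Illustrative only; mirrors the source_name / _build_mappings pattern."""

    source_name = "ADPUPA"

    def __init__(self):
        self._build_mappings()

    def _build_mappings(self):
        # Keys: names as they might come out of the BUFR decoder.
        # Values: output column names. Both sides are placeholders.
        self.field_mappings = {
            "latitude": "LAT",
            "longitude": "LON",
            "pressure": "PRLC",
            "airTemperature": "TMDB",
        }

    def to_row(self, decoded):
        """Rename decoded keys to output columns; absent keys become None."""
        return {
            column: decoded.get(key)
            for key, column in self.field_mappings.items()
        }


processor = ADPUPASketch()
row = processor.to_row({"latitude": 52.1, "pressure": 85000.0, "extra": 1})
```

Keys that are not in the mapping (such as `extra` above) are dropped, and mapped keys missing from the input come back as `None`, so every row has the same set of columns.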
Fixes #170
How Has This Been Tested?
Added a pytest test which:
Reads a BUFR file from the NNJA archive
Converts it to Parquet
Compares the output with a reference NNJA-AI Parquet file
Passes if schemas and values match exactly
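The "match exactly" check above can be sketched without any particular Parquet library. This is an assumption-laden illustration, not the test in this PR: `compare_tables` is a hypothetical helper, and tables are represented as plain dicts mapping a column name to a list of values, as they might look after being read from Parquet.

```python
def compare_tables(actual, expected):
    """Return a list of differences; an empty list means an exact match.

    Both tables are dicts mapping column name -> list of values.
    """
    problems = []
    # Schema check: same column names in the same order.
    if list(actual) != list(expected):
        problems.append(f"columns differ: {list(actual)} != {list(expected)}")
    for name in expected:
        if name not in actual:
            continue
        got, want = actual[name], expected[name]
        if len(got) != len(want):
            problems.append(f"{name}: {len(got)} rows != {len(want)} rows")
            continue
        # Value check: equal values of the same Python type.
        for i, (g, w) in enumerate(zip(got, want)):
            if g != w or type(g) is not type(w):
                problems.append(f"{name}[{i}]: {g!r} != {w!r}")
    return problems
```

Returning every difference instead of stopping at the first one makes a failed run easier to diagnose. Note that this strict comparison treats `float("nan")` as unequal to itself, so missing values would need to be represented as `None` or handled separately.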
Yes
If your changes affect data processing, have you plotted any changes? i.e. have you done a quick sanity check?
Checklist: